The quest for a generic bird target to detect the presence of bird in food products and considerations for paleoprotein analysis

It can be important for consumers to know whether food products contain animal material and, if so, of which species. Food products with animal material as an ingredient often contain collagen type 1. LC-MS/MS (Liquid Chromatography–tandem Mass Spectrometry) was applied as a technique to generically detect bird material. Unlike for fish, for example, which have experienced longer divergence times, it is still possible to find generic LC-MS targets for avian type 1 collagen. After theoretical target selection using 83 collagen 1α2 bird sequences of 33 orders and construction of a common ancestor sequence of birds, experimental evidence was provided by analyzing extracts from 10 extant bird species. Two suitable options have been identified. The combination of VGPIGPAGNR and VGPIGAAGNR (pheasant only) covers all investigated birds and was not found in other species. The peptide EGPVGFpGADGR covers all investigated birds, but also occurs in several species of crocodiles and turtles. The presence of the generic peptide (combination) was confirmed in food products, proving the principle; these peptides can therefore be used to detect the presence of bird material. Furthermore, it is shown how the use of constructed ancestor sequences could benefit the field of paleoproteomics, in the interpretation of collagen MS/MS spectra of ancient species. Our theoretical analysis and assessment of reported Brachylophosaurus canadensis collagen 1α2 MS/MS data provided support for several previous peptide sequence assignments, but we also propose that our constructed ancestral bird sequence GPpGESGAVGPAGPIGSR may fit the MS/MS data better than the original assignment GLPGESGAVGPAGPpGSR.


Introduction
Collagen type 1 forms the structural and mechanical scaffolding of skin, bones, tendons, blood vessel walls, cornea and other connective tissues. It is the most abundant protein in vertebrates [1] and consists of triple helices [2]. Usually the helices are heterotrimers of two collagen 1α1 chains and one collagen 1α2 chain [3], although skin type 1 collagens from several bony fish additionally contain a 1α3 chain in the heterotrimer along with 1α1 and 1α2 [4][5][6][7]. Collagen analysis is performed in several fields, e.g. in medical research [8,9], food chemistry [10] and paleoprotein analysis [11][12][13][14]. Food products with animal material as an ingredient often contain collagen type 1, either due to its ubiquity and high abundance in (extracts from) animal tissues and/or by the intended addition of gelatin, which is partly hydrolyzed collagen often obtained from skin or bone [15,16]. For religious or lifestyle reasons it may be important for consumers to know whether food products contain animal material and, if so, of which species. To determine animal species, suitable and reliable analytical techniques should be applied, targeting informational biomolecules such as DNA or protein [17]. In order to find generic targets to detect the presence of bird in food products, the main goal of this study, it is necessary to perform a detailed analysis of their molecular evolution in birds and to construct ancestral sequences. An ancestor sequence of avian collagen represents collagen from the past and, compared to sequences of extant birds, will bear a greater resemblance to collagen extracted from fossilized bones of ancient birds and non-avian dinosaur species. Therefore, the approach presented in this study is also relevant to paleontologists investigating the sequence of collagen in fossilized tissues. 
The treatment conditions in the gelatin production process destroy most of the DNA, severely reducing the sensitivity of DNA-based methods or leading to the occurrence of false negative results [10]. The performance of ligand binding assays (LBA) depends on the 3D structure of analyte proteins [18], which is affected by gelatin production conditions or by food processing in general. Responsible use of LBA in (processed) food analysis may require knowledge of reagent-analyte interactions at the molecular level to assess the suitability of an assay, especially with regard to its selectivity and the potential change in affinity for analyte proteins that have a changed 3D structure after food processing. It is preferable to use bottom-up protein LC-MS/MS (Liquid Chromatography-tandem Mass Spectrometry) as a detection technique over DNA-based techniques and LBA, because the primary protein structures often remain largely intact during food processing. This is beneficial for reliable detection, especially when smaller target peptides are cleaved from the primary structure. Moreover, the largely intact primary structures of collagen contribute to the intrinsic properties of gelatin, i.e. the enabling of gelatinization. Targets that differ in a single amino acid can easily be distinguished using LC-MS/MS by their retention time, the m/z of precursor ions and/or the m/z of product ions after fragmentation [19]. Several strategies can be applied to detect animal species: one strategy is to select unique targets per species, which are absent in other species; another approach is to find generic targets for a whole group of animal species [20], such as birds. The latter approach was followed in this study. The goal of the study was to identify generic bird targets and to add these as modules to the TrustGel™ method [21].
Phylogenetic studies indicate an early emergence of the three fibrillar collagen clades A to C, before the eumetazoan radiation [22]. Therefore, collagen type 1 chains, belonging to the A clade, are present in all bird species and provide a good basis for finding a generic bird target. Eumetazoan radiation occurred approximately 530 million years ago [23], well before the evolution of modern birds. Approximately 165-150 million years ago, birds evolved from theropod dinosaurs and continuously served as a vehicle for protein evolution. The mass extinction event at the end of the Cretaceous, 66 million years ago, decimated the number of bird species and lineages. After that, birds explosively diversified into more than 10,000 species today [24][25][26]. The closest extant relatives to birds are, in descending order, crocodiles, turtles and lizards (including snakes and geckos) [27].
Previously, we developed method modules for the individual detection of quantitative porcine and bovine targets [21] and for several fish species [7], amongst others. Generic bird targets should cover as many bird species as possible, but should be different from other animal species. An advantage of a generic target is that fewer transitions need to be added to a targeted, quantitative LC-MS method compared to when species are detected individually, leaving more room for detection of other targets in the same method. A disadvantage is that it cannot be known whether a generic target truly covers an entire group when genetic information is not available for every species in the group and for a sufficient number of individuals, to cover the variation within a species. However, these aspects will also impair the detection of individual species with unique targets. The theoretical target selection was performed using 83 bird sequences, including the bird species most important in food products, several sequences from related animal groups and database searches. Experimental support was provided by analyzing 10 bird species using non-targeted LC-MS/MS. Targets were selected from collagen 1α2, due to 1) the high abundance of collagen type 1 in general and because 2) less (reliable) genetic information could be retrieved from databases for avian collagen 1α1 than for collagen 1α2. Using collagen 1α1 as target source would reduce the quality of the theoretical target selection process, compared to collagen 1α2. Additionally, gene loss has not been reported for collagen 1α2 in birds, in contrast to collagen 1α1 [28]. Ultimately, one generic target and one target combination met the selection criteria. Their presence was experimentally confirmed in extracts from 10 extant bird species, in chicken soup and in chicken broth, as proof of principle for food products. 
Finally, it was shown how our combined food chemistry and molecular evolution approach could benefit paleoprotein analysis, in the interpretation of collagen MS/MS spectra from ancient species.

Materials and methods
Bird products (ostrich, goose, duck, turkey, chicken, pheasant, guinea fowl, pigeon, partridge, quail, various cuts) were purchased from local supermarkets. Collagen from these birds was extracted by placing several grams per product in milliQ water in an oven at 100˚C for 2 days. After centrifugation, 4 ml extract was added to 3 ml pentane, shaken and centrifuged. After 1 hour at < -70˚C the pentane layer was removed. Aliquots were digested with trypsin without reduction or alkylation because the collagen GXY domain, from which the peptide targets of interest were cleaved, contains no disulfide bridges. Chicken soup was homemade and was sampled by taking an aliquot of the gelatinous part, after the soup had cooled. Chicken broth (brand A (1.3% chicken meat powder) and brand B (3.1% chicken meat powder)) and beef broth (2.3% beef extract) tablets were purchased from local supermarkets. Tablets were dissolved in 100 mM ammonium bicarbonate at 95˚C for 3 hours. After centrifugation, aliquots were digested with trypsin. The beef broth sample served as a negative control.
Samples were analyzed using a combination of a UHPLC (Ultimate 3000, Dionex) and a Q-Exactive mass spectrometer (ThermoElectron). Separation was achieved on an Acquity HSS T3 column (2.1 × 100 mm, 1.8 μm, Waters, Milford, PA, USA) at a temperature of 40˚C with an injection volume of 10 μl. Mobile phases consisted of milliQ water (A) and acetonitrile (B), both containing 0.1% formic acid. A binary gradient from 2% to 30% B was applied at a flow rate of 0.5 ml min⁻¹, followed by a column wash and equilibration. The total run time was 18 minutes. All peptides were analyzed using electrospray ionization in positive mode (HESI source) using a full-scan data-dependent method with a range of m/z 200-2000. Other settings were: resolution (35,000), spray voltage (3.0 kV), capillary temperature (320˚C), heater temperature (350˚C), S-Lens RF Level (50 V), AGC target (1e6), and maximum IT (150 ms). The top 5 ions were subjected to data-dependent scans at a normalized collision energy of 15, 25 and 35. XCalibur software version 3 (ThermoScientific) was used for data acquisition. Data analysis was performed manually.
Sequences of 83 avian collagen 1α2 mRNA or cDNA entries were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/advanced), accessed in September 2021 (see Table 1). We preferred to use database nucleotide sequences, due to their higher reliability compared to protein sequences. For comparison, crocodile, turtle, snake, mammalian, amphibian and fish collagen 1α2 sequences were added to the set, as well as a constructed common ancestral bird collagen 1α2 sequence. The sequences were translated to protein using Microsoft Excel version 2103. Collagen 1α2 sequences were only included in the data file if the GXY domain was 1014 codons in length (excluding the subsequent GGG triplet) and if there was a glycine codon in each first GXY position, to promote the inclusion of high quality sequences [29]. The sequence of Dromaius novaehollandiae contained missing information, namely GGK at codon position 781 and CCY at position 1001. These codons were adapted to GGT and CCT, respectively, as these were the majority codons at the indicated positions. The bird species in the data set are from 33 orders. A coding DNA estimation of the collagen 1α2 GXY domain of the common bird ancestor was composed using the sequences of one species per order, marked with an asterisk in Table 1. Although there are differences in age between orders, the same weight was provided to each order to calculate the estimation. Of the 1014 positions, 1007 had majority codons (present in 17 or more of the 33 species) that were automatically selected for the ancestral sequence. There were 5 positions with a most abundant non-majority codon, that was selected. Finally, there were 2 positions that exhibited 2 non-majority codons of equal abundance. At position 611 (15x CCT, 15x CCC, 1x CCA and 2x GCC) CCC was selected for the ancestral sequence, because of its slightly higher probability compared to CCT when only single nucleotide changes are considered. 
At position 863 (16x AAC, 16x AGC and 1x AGT) AGC was selected for the same reason. The constructed common ancestral sequence of birds is reported in Fig 1 and was used to assess the genericity and suitability of bird collagen 1α2 targets.
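The inclusion filter and the majority-codon construction described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the tie-break rule (choosing the tied codon that needs the fewest single-nucleotide changes to reach all observed codons) is only one possible reading of the single-nucleotide-change argument; it is an assumption, not the documented procedure, although it does reproduce the choices reported for positions 611 and 863.

```python
from collections import Counter

GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}
GXY_CODONS = 1014  # required length of the GXY domain, in codons


def codons(cds):
    """Split a coding sequence into consecutive three-letter codons."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def passes_inclusion_filter(cds, n_codons=GXY_CODONS):
    """True if the domain has n_codons codons and a glycine codon
    at every first position of each GXY triplet."""
    if len(cds) != 3 * n_codons:
        return False
    return all(c in GLY_CODONS for c in codons(cds)[0::3])


def hamming(a, b):
    """Number of nucleotide differences between two codons."""
    return sum(x != y for x, y in zip(a, b))


def ancestral_codon(observed):
    """Most abundant codon at one position, one vote per order.

    Assumed tie-break: among equally abundant codons, take the one with
    the smallest total number of nucleotide changes to all observed codons.
    """
    counts = Counter(observed)
    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]

    def cost(candidate):
        return sum(n * hamming(candidate, c) for c, n in counts.items())

    return min(tied, key=cost)


def ancestral_sequence(aligned_cds):
    """Column-wise consensus over equal-length coding sequences."""
    columns = zip(*(codons(s) for s in aligned_cds))
    return "".join(ancestral_codon(col) for col in columns)


# Codon counts reported in the text for the two tied positions.
pos611 = ["CCT"] * 15 + ["CCC"] * 15 + ["CCA"] + ["GCC"] * 2
pos863 = ["AAC"] * 16 + ["AGC"] * 16 + ["AGT"]
print(ancestral_codon(pos611), ancestral_codon(pos863))
```

With the reported counts, the sketch selects CCC at position 611 and AGC at position 863, in line with the choices described above.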

Theoretical target selection
The set of 83 bird collagen 1α2 sequences that met the selection criteria was visualized in codon, codon group and amino acid usage tables [29], to aid in the identification of a generic peptide (see S1 File). Together with several more distant species, the similarities of the birds' cDNA sequences are shown in Fig 2, by the number of mutual nucleotide differences. Additionally, it was calculated which amino acids were fully conserved across the 83 species. After construction of the common bird ancestral cDNA sequence (see Fig 1), which was also included in the comparison of Fig 2, the ancestral sequence was translated to protein and in silico digested with trypsin, resulting in the formation of 79 peptides containing part of the GXY domain, as summarized in Table 2. The analogous human collagen 1α2 (Uniprot entry P08123) contains 11 amino acids and 1 tryptic cleavage site N-terminal and 15 amino acids and 1 tryptic cleavage site C-terminal of the GXY domain. The selection of generic tryptic targets for birds was performed in two rounds. The first round focused mainly on genericity of the target, unambiguity (meaning the target is present in a single form) and analyzability; the second round was aimed at uniqueness versus non-bird species. The following criteria were applied during the first selection round:
A. The peptide target should contain a maximum of 1 N, Q or M residue. Whereas full target unambiguity is highly desirable for quantitative targets, for qualitative targets it may be allowed that the sequence contains an amino acid that can be partially modified, e.g. N or Q (deamidation) or M (oxidation). We chose to avoid peptide targets containing more than one amino acid that could be partially modified, due to the resulting increase in the number of possible forms to be monitored.
B. The length of the peptide should be at least 7 residues to provide sufficient uniqueness versus other species than birds.
C. The peptide target should contain a maximum of 1 amino acid that has not been fully conserved, with respect to the 83 bird species.
D. A maximum of 2 amino acid variations were allowed in the single non-conserved amino acid (see criterion C), again to limit the number of possible forms to be monitored and to reduce the probability that bird species that are not part of the set of 83 would show unexplored variations.
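The in silico tryptic digestion mentioned above can be sketched as follows. This is a minimal illustration using the common rule that trypsin cleaves C-terminal to K or R unless the next residue is P; the function name is hypothetical, hydroxylation is not modelled, and the flanking residues in the example string are invented for demonstration only.

```python
def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


# Toy stretch: the two central peptides are taken from the text,
# the flanking residues "GPAGAR" and "GEP" are hypothetical.
toy = "GPAGAR" + "EGPVGFPGADGR" + "VGPIGPAGNR" + "GEP"
for peptide in tryptic_digest(toy):
    print(len(peptide), peptide)
```

Peptide length is printed alongside each sequence because it feeds directly into criterion B.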
The results of the target selection are summarized in Table 2. For each of the 79 tryptic peptides of the constructed common bird ancestral sequence, the assignment is given in the third column. The assignments consist of a letter A to D, indicating which of the aforementioned criteria was not met (not exhaustive), and a number. For A assignments the number indicates the number of N, Q or M amino acids in the sequence, for B assignments it indicates the peptide length, for C assignments the number of non-conserved residues in the peptide and for D assignments the number of amino acid variations in the single non-conserved residue. Fully conserved peptides were designated E. Only D2 and E assignments entered the second round of selection. It should be noted that all D2 and E peptides also exhibited fully conserved preceding cleavage sites and absence of P in the amino acid C-terminal to the peptide, which is essential for their release from the primary structure by trypsin digestion. There were a total of 4 D2 and 6 E candidates. In the second round, the candidate peptides were subjected to a protein BLAST search. The applied criterion was that no 100% hits should be obtained with animal species other than bird species. It was expected that hits with non-avian species would be obtained, especially for the E candidates. Hits with many other species were obtained for 5 out of 6 E candidates. However, the E candidate EGPVGFpGADGR had a limited number of hits, only with several species of crocodiles and turtles. It should be noted that p represents hydroxyproline, which is often present at the third GXY position. For 3 out of 4 D2 candidates, hits with many non-avian species were obtained, while there were no non-avian hits for the peptide VGPIGPAGNR. The only two remaining candidates represent positions 375-386 and 387-396 of the GXY domain. Fig 3 illustrates why this part of the sequence is suitable as generic bird target.
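The first-round assignment can be made concrete with a short sketch. It is illustrative only: the function names and the input format (one set of observed amino acids per peptide position) are hypothetical, and because the assignments are described as not exhaustive, reporting the first unmet criterion in the order A to D is an assumption.

```python
PARTIALLY_MODIFIABLE = set("NQM")  # deamidation (N, Q) or oxidation (M)


def first_round_label(peptide, variants_per_position):
    """Return a letter-plus-number label for one ancestral tryptic peptide.

    variants_per_position[i] is the set of amino acids observed at
    position i across the bird sequences; a one-element set means the
    position is fully conserved.
    """
    n_modifiable = sum(aa in PARTIALLY_MODIFIABLE for aa in peptide)
    if n_modifiable > 1:
        return f"A{n_modifiable}"      # criterion A not met
    if len(peptide) < 7:
        return f"B{len(peptide)}"      # criterion B not met
    variable = [v for v in variants_per_position if len(v) > 1]
    if len(variable) > 1:
        return f"C{len(variable)}"     # criterion C not met
    if len(variable) == 1:
        return f"D{len(variable[0])}"  # D2 still passes, D3 and up do not
    return "E"                         # fully conserved


def enters_second_round(label):
    """Only D2 and E peptides go on to the uniqueness check."""
    return label in {"D2", "E"}


def conserved(peptide):
    """Helper: variant sets for a fully conserved peptide."""
    return [{aa} for aa in peptide]


# VGPIGPAGNR: position 6 is P in most birds and A in pheasant.
variants = conserved("VGPIGPAGNR")
variants[5] = {"P", "A"}
print(first_round_label("VGPIGPAGNR", variants))
```

The hydroxyproline notation (lowercase p) is ignored here, since the criteria operate on the translated sequence.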
Positions 375-396 of the bird sequence are more similar to the sequences of crocodiles and turtles than to the sequences of other reported species, as expected from the evolutionary relations. The peptide EGPVGFpGADGR is exactly the same between the constructed common ancestor of birds and the crocodile and turtle sequences shown, but differs from the next closest species groups of snakes, lizards and geckos as well as from the amphibian and fish species. When it can be excluded that a sample contains crocodile or turtle material, by tracing the product's origin, the peptide EGPVGFpGADGR is suitable as generic bird peptide, especially as it is fully conserved considering the 83 bird species listed in Table 1. Ideally, a generic bird peptide should not occur in crocodile and turtle. As can be seen from Table 2, no other fully conserved E peptide is available with respect to the 83 investigated birds, but the VGPIGPAGNR peptide is the most suitable D2 candidate. First, the constructed common ancestral peptide of birds differs crucially from crocodiles and turtles in that there is an N residue at the 9th position, whereas an A residue occurs in the crocodile and turtle species. Second, there is an I residue at the 4th position, which is often T in crocodiles and turtles. As mentioned previously, VGPIGPAGNR is not fully conserved as it occurs in 82 of the 83 bird species investigated. Only in Phasianus colchicus (common pheasant) has the P residue at the 6th position changed to an A residue, resulting in the sequence VGPIGAAGNR. The combination of VGPIGPAGNR and VGPIGAAGNR can therefore be used to generically investigate the presence of birds, as it covers the whole group and differs from other animal groups.

Fig 3. [...] [21], which corresponds to amino acid (without the subscript). Compared to the common ancestor of birds, yellow cells correspond to amino acid differences and orange cells to codon group differences that do not lead to amino acid differences. Tryptic cleavage sites are presented as thick lines between cells.
https://doi.org/10.1371/journal.pone.0279369.g003

Experimental assessment
Extracts from ostrich, goose, duck, turkey, chicken, pheasant, guinea fowl, pigeon, partridge and quail were part of the experimental data set. The results are summarized in Table 3. All relevant chromatograms and MS/MS spectra are reported in S2 File. A complicating aspect of the combination VGPIGPAGNR/VGPIGAAGNR is that it contains asparagine, which can be deamidated [30], especially in highly processed foods, which negatively affects the unambiguity. Therefore, it is necessary to also monitor the deamidated forms, which are 1 Da higher in mass, when examining the presence of bird material in processed food samples, bringing the total number of forms to be monitored to 4. An advantage of VGPIGPAGNR/VGPIGAAGNR is that it is shorter than EGPVGFpGADGR, theoretically making the combination more suitable to detect collagen if it were present in a more hydrolyzed form. It should be emphasized that the peptides were selected from 83 species classified into 33 orders. Since it is not yet possible to obtain a complete overview of sequences of birds and other species, it is advisable to monitor EGPVGFpGADGR besides VGPIGPAGNR/VGPIGAAGNR when the presence of birds is investigated. In addition, (functionally irrelevant) amino acid changes may have occurred in individuals within a species or be fixed in any bird species, exemplified by the pheasant change to VGPIGAAGNR, which will diminish the suitability of a generic target. On the other hand, (processed) food products often contain material from numerous individuals, which increases the suitability of a generic target. After having determined the presence of EGPVGFpGADGR and VGPIGPAGNR/VGPIGAAGNR in the extracts of 10 different bird species, we decided to investigate the presence of the same peptides in processed food products: homemade chicken soup, chicken broth tablets and a beef broth tablet as negative control. The results of these analyses are also summarized in Table 3.
As expected, VGPIGPAGNR and EGPVGFpGADGR were detected in the chicken food products, but not in the bovine food product. In Fig 5 chromatograms [...] exact mass and part of the fragment ions and thus can be easily distinguished. The bird peptides were clearly absent in beef broth. Instead, the bovine peptides GETGPAGPAGPIGPVGAR and GIpGEFGLpGPAGAR were detected, confirming the presence of bovine collagen 1α1 and 1α2 in the beef broth negative control sample. Finally, an MS/MS spectrum of EGPVGFpGADGR ([M+2H]2+ ions, m/z 587.778) obtained from chicken broth brand A is presented in Fig 6. The fragment ions were according to expectation: mainly b and y type ions, including water loss related to the N-terminal glutamic acid [31]. All relevant chromatograms and MS/MS spectra are reported in S2 File.
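The precursor m/z values of the monitored forms can be checked with a small calculation. The sketch below is illustrative only: it uses standard monoisotopic residue masses, treats lowercase p as hydroxyproline (proline plus one oxygen), and models deamidation of N as a shift of +0.98402 Da, the "1 Da" referred to in the text. Function and constant names are hypothetical.

```python
# Monoisotopic residue masses in Da; lowercase p denotes hydroxyproline.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276, "V": 99.06841,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "p": 113.04767,
}
WATER = 18.01056
PROTON = 1.00728
DEAMIDATION = 0.98402  # mass shift when N is converted to D


def precursor_mz(peptide, charge=2, deamidated=False):
    """m/z of the protonated peptide at the given charge state."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    if deamidated:
        mass += DEAMIDATION
    return (mass + charge * PROTON) / charge


def monitored_forms():
    """The generic peptide plus the four forms of the peptide combination."""
    forms = {"EGPVGFpGADGR": precursor_mz("EGPVGFpGADGR")}
    for peptide in ("VGPIGPAGNR", "VGPIGAAGNR"):
        forms[peptide] = precursor_mz(peptide)
        forms[peptide + " (deamidated)"] = precursor_mz(peptide, deamidated=True)
    return forms


for name, mz in monitored_forms().items():
    print(f"{name:28s} [M+2H]2+ = {mz:.3f}")
```

For EGPVGFpGADGR this reproduces the [M+2H]2+ value of m/z 587.778 quoted above. At charge 2+, deamidation shifts the precursor by about 0.49 m/z units, which is why each N-containing peptide contributes two forms to the monitoring list.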

Target coverage
Depending on the level of required genericity and uniqueness versus other animal species, it can be considered to use other peptides from Table 2 during an experiment. As an example the C2-assigned VGApGPAGAR is discussed. The peptide contains 2 non-conserved amino acids, regarding the 83 investigated bird species, both of which exhibit two amino acid variations. The amino acid in the 1st position is V in 60 species and I in 23 species. The amino acid in the 3rd position is A in 82 species and G in 1 species, giving a total of 3 forms. The main form VGApGPAGAR occurs in 60 species, including duck; the second most abundant form IGApGPAGAR occurs in 22 species, including chicken. Finally, only Phaethon lepturus (white-tailed tropicbird) contains IGGpGPAGAR. The combination VGApGPAGAR/IGApGPAGAR is sufficient to analyze chicken in duck and vice versa when it can be assumed that there are no other species present. When other (bird) species might be present, this combination is not suitable. Moreover, VGApGPAGAR provides hits with many non-bird species (e.g. rat and mouse) and therefore the peptide is not suitable to generically investigate the presence of bird material in food products. In all cases, peptide targets should be assessed regarding the required genericity and uniqueness, in relation to the goal of the study.

Table 3. Results of the experimental assessment of generic bird peptides. The presence of VGPIGPAGNR / VGPIGAAGNR (and/or deamidated) and EGPVGFpGADGR in several bird samples and negative control beef broth is indicated.
Sample | Detected peptide: VGPIGPAGNR (and/or deamidated) | VGPIGAAGNR (and/or deamidated) | EGPVGFpGADGR
Ostrich tendon | [...]
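The coverage reasoning for a peptide combination can be expressed as a simple tally over the species counts reported in the text. The sketch is illustrative and the function name is hypothetical; the counts are those given for the 83 bird sequences.

```python
def coverage(form_counts, monitored):
    """Return (covered, total): species whose peptide form is monitored,
    and all species in the set."""
    total = sum(form_counts.values())
    covered = sum(n for form, n in form_counts.items() if form in monitored)
    return covered, total


# Number of bird species (out of 83) carrying each form, as reported.
C2_FORMS = {"VGApGPAGAR": 60, "IGApGPAGAR": 22, "IGGpGPAGAR": 1}
D2_FORMS = {"VGPIGPAGNR": 82, "VGPIGAAGNR": 1}

print(coverage(C2_FORMS, {"VGApGPAGAR", "IGApGPAGAR"}))
print(coverage(D2_FORMS, {"VGPIGPAGNR", "VGPIGAAGNR"}))
```

Monitoring both VGPIGPAGNR and VGPIGAAGNR covers all 83 sequences, whereas the VGApGPAGAR/IGApGPAGAR pair leaves out the white-tailed tropicbird. Coverage alone does not make a target suitable, since uniqueness versus non-bird species must be assessed separately.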
The peptide combination VGApGPAGAR/IGApGPAGAR is a clear example of a protein sequence part that could lead to confusing results when subjected to phylogenetic analysis. In a previous study it was established that changes in the GXY domain of type 1 collagens appear to be mainly the result of genetic drift and that back changes in a species' lineage can also occur within a functionally restricted space [29]. Whilst most of the bird orders investigated in the present study exhibited either the VGApGPAGAR or IGApGPAGAR form, the orders Passeriformes, Galliformes, Apodiformes and Anseriformes showed both forms within the species set. This could indicate that the two peptide forms have interchanged more than once during evolution and that the direction has not always been purely divergent. Alternatively, the peptide may not have been fully fixed to either form when the mentioned orders split off, a situation that may persist even in the present, also within single species. Yet, fixation of a mutation is never an end result, as exemplified by the white-tailed tropicbird sequence IGGpGPAGAR, which shows further divergence originating from a different position in the peptide.

Collagen past, present and future
We presented several options to generically detect bird collagen 1α2, based on the current variation in the protein sequence. The divergence process, based mainly on genetic drift [29], will proceed in the future and, therefore, it is expected that more and more bird species will not contain the generic peptides presented in this study at some point in the future. Species will inevitably drift further apart, although this process is very slow. For example, in humans (including the intermediate ancestors to a constructed common Euarchontoglires ancestor) an average of 1.1 nucleotide changes appears to have been fixed in the collagen 1α1 GXY domain (3042 nucleotides) per million years. This amounts to 3.5 × 10⁻¹⁰ changes per nucleotide per year, excluding back changes [29]. Although the variation in collagen sequences within populations can be high [32], it is very rare for variations to become fixed. Moreover, only part of the nucleotide changes result in amino acid changes, which can be detected by protein LC-MS. The factor between nucleotide and amino acid changes is not constant, reducing the suitability of protein sequences for quantitative assessment of evolutionary relations [29]. Although collagen sequence changes in the more recent past appear to mainly have been governed by genetic drift, it is conceivable that future changes again will be more influenced by selective pressure, e.g. due to a change in external conditions. Such a change would have to be quite drastic to really affect the required collagen properties for tissue structure. Besides functional and informational restriction, the maintenance of code robustness [33] may also play a role during divergence, exemplified by codon usage bias structures. Overall, the current divergence status of bird collagen 1α2 makes it possible to apply generic detection using protein LC-MS.
For fish species, however, it was observed that it does not seem feasible to select a comprehensive generic collagen type 1 target, due to the much longer divergence times [7]. The selection of generic peptides for fish species is still possible, but at a lower level, e.g. for fish families. Unfortunately, the taxonomy nomenclature of species types, such as birds or fish, is quite confusing because the organization levels "orders" and "families" do not represent similar divergence times for birds and fish. This effect is especially visible in protein domains that are mainly governed by genetic drift, but can be obscured in domains under high selective pressure.
To aid in finding candidate targets, a bird ancestral sequence was constructed, which estimates the collagen 1α2 GXY sequence of birds in the past, from which the present sequences have diverged. The constructed ancestral sequence may also be useful for identification of collagen in fossilized tissues of birds and related species using paleoprotein analysis [34] as it has been observed that collagens can be preserved for millions of years [11,12]. In a previous study [13], five collagen 1α2 peptide sequences were reported for a specimen of Brachylophosaurus canadensis (age 80 million years), a hadrosaur species, see Table 4.
Table 4. Evaluation of reported Brachylophosaurus canadensis collagen 1α2 peptides. The corresponding sequences from the constructed common ancestor of birds are in the right column. Amino acid differences are highlighted in yellow.
B. canadensis 1α2 | Bird common ancestor 1α2
[...]

Three of the five peptide sequences are exactly the same between Brachylophosaurus canadensis and the constructed common bird ancestor, and for one peptide there was a single threonine-alanine difference. The bottom peptide from Table 4 exhibited two differences, leucine-proline and proline-isoleucine. These types of changes are not unexpected in collagen 1α2. In a previous study, we found that, when amino acid changes occur, a selection of codon groups, such as A, P, V, T, S1, and I, is predominantly involved in changes between closely related 1α2 collagens [35], as part of a larger change infrastructure. Leucines are slightly less involved. Again, hydroxyproline is often present at the third GXY position instead of proline. Therefore, we further investigated the reported MS/MS spectrum for Brachylophosaurus canadensis GLPGESGAVGPAGPpGSR [12]. It was deduced that, depending on the obtained resolution, our constructed ancestral bird sequence GPpGESGAVGPAGPIGSR (which has exactly the same mass as the reported GLPGESGAVGPAGPpGSR) may fit the MS/MS data better than the original assignment, as it would explain the ions observed at nominal m/z 1424. These ions were assigned as "Potentially co-eluting contaminating ion" but could be assigned as y16+ ions of GPpGESGAVGPAGPIGSR. The unassigned peak at nominal m/z 712 could then represent y16 2+ ions of GPpGESGAVGPAGPIGSR. This finding indicates that the construction of ancestral sequences using sequences of extant species could be helpful in the structural elucidation of paleoproteins, linking the research fields food chemistry, molecular evolution and paleontology, and providing a strong combination of disciplines to support paleontology in the 21st century and beyond.
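The mass argument above can be reproduced with a short calculation. The sketch below is illustrative only: it uses standard monoisotopic residue masses, writes hydroxyproline as lowercase p, and compares the integer part of the calculated y16 m/z with the nominal values quoted in the text. Function names are hypothetical.

```python
# Monoisotopic residue masses in Da; lowercase p denotes hydroxyproline.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "I": 113.08406, "E": 129.04259,
    "R": 156.10111, "p": 113.04767,
}
WATER = 18.01056
PROTON = 1.00728


def neutral_mass(peptide):
    """Monoisotopic mass of the intact, uncharged peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def y_ion_mz(peptide, n, charge=1):
    """m/z of the y ion made of the n C-terminal residues."""
    fragment = peptide[-n:]
    mass = sum(RESIDUE[aa] for aa in fragment) + WATER
    return (mass + charge * PROTON) / charge


original = "GLPGESGAVGPAGPpGSR"   # reported assignment
proposed = "GPpGESGAVGPAGPIGSR"   # constructed ancestral bird sequence

print(f"mass difference: {abs(neutral_mass(original) - neutral_mass(proposed)):.5f}")
for name, seq in (("original", original), ("proposed", proposed)):
    print(name, f"y16+ = {y_ion_mz(seq, 16):.3f}",
          f"y16 2+ = {y_ion_mz(seq, 16, charge=2):.3f}")
```

Both sequences have the same composition (L and I are isobaric), so the precursor masses are identical. Only the proposed sequence places the hydroxyproline inside the y16 fragment, which gives calculated values near m/z 1424.7 (1+) and 712.9 (2+), whereas the y16+ ion of the original assignment is calculated near m/z 1408.7.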

Conclusion
Generic LC-MS bird targets were identified after theoretical target selection, using a set of 83 bird collagen 1α2 sequences of 33 orders and a constructed common ancestral sequence, followed by experimental assessment of extracts from 10 bird species. Two tryptic target peptides passed the selection criteria, aimed at genericity, unambiguity, analyzability and uniqueness vs. non-bird species. The combination of VGPIGPAGNR and VGPIGAAGNR (pheasant only) covers all the investigated birds and was not found in other species using protein BLAST. It should be noted that it is necessary to monitor the deamidated forms in combination with the unmodified forms, as deamidation of N can occur during food processing. The peptide EGPVGFpGADGR covers all the investigated birds, but also occurs in several species of crocodiles and turtles. Only when the presence of the latter species can be excluded is the peptide suitable as generic bird target. The presence of the generic peptide (combination) was confirmed in chicken soup and broth, with beef broth as negative control sample, providing proof of principle in food products. The constructed common ancestral bird sequence was also used to evaluate elucidated dinosaur sequences, demonstrating that the use of ancestral sequences could be helpful in paleoprotein analysis.

Supporting information
S1 File. Calculations and data. Calculations and data regarding the generic targets, the common bird ancestor and the distance table.